218 research outputs found

    Testing statistical hypothesis on random trees and applications to the protein classification problem

    Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes. Comment: Published at http://dx.doi.org/10.1214/08-AOAS218 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Castillejo’s sonnet

    We discuss the authorship of the sonnet whose first line is «Si las penas que dais son verdaderas». For centuries it was attributed without question to Cristóbal de Castillejo, but it has recently been published by several authors as the work of Juan Boscán. We think that this latter attribution is erroneous, perhaps motivated by the title «Soneto de Boscán» that Velasco, who edited Castillejo's work in 1573, gave to this sonnet. Comparing two versions of Castillejo's work, both of which include this sonnet, allows us to appreciate how fine a poet the author was and how scrupulous he was in his revisions.

    SnailVis: a Paradigm to Visualize Complex Networks

    We propose a new non-parametric, linear-complexity algorithm to visualize complex networks that have previously been decomposed into subsets according to some criterion. We show two representations: the first includes all edges and vertices, and the second, summarized, highlights the subsets and their relations. In this paper we use a community decomposition algorithm to generate the subsets; then we rank them by the number of inter-community connections. We also highlight the central core of each community, that is, the subset with the highest connectivity level, which is the kmax-core of the k-core decomposition. Sociedad Argentina de Informática e Investigación Operativ
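
    As an illustration of the kmax-core mentioned in this abstract, the following is a minimal sketch of a k-core decomposition by repeated peeling of a minimum-degree vertex. It is not the SnailVis algorithm: the function names `core_numbers` and `kmax_core` and the edge-list input format are assumptions made for the example, and this simple version runs in quadratic time rather than the linear time of bucket-based implementations.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected edge list.

    Hypothetical helper: peels a minimum-degree vertex at each step.
    Simple O(n^2) version, not the linear-time bucket algorithm.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # The vertex of smallest remaining degree is peeled next.
        v = min(remaining, key=degree.get)
        # Core numbers never decrease along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def kmax_core(edges):
    """Return (kmax, vertices whose core number equals kmax)."""
    core = core_numbers(edges)
    kmax = max(core.values())  # raises ValueError on an empty graph
    return kmax, {v for v, c in core.items() if c == kmax}
```

    For a 4-clique with a two-vertex tail attached, `kmax_core` returns the clique, since the tail vertices are peeled first with core number 1.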

    Obtaining Communities with a Fitness Growth Process

    The study of community structure has been a hot topic of research in recent years. But, while it has been successfully applied in several areas, the concept lacks a general and precise definition. Facts like the hierarchical structure and heterogeneity of complex networks make it difficult to unify the idea of community and its evaluation. The global functional known as modularity is probably the most widely used technique in this area. Nevertheless, its limits have been studied in depth. Local techniques such as those by Lancichinetti et al. and Palla et al. arose as an answer to the resolution limit and degeneracies of modularity. Here we start from the algorithm by Lancichinetti et al. and propose a unique growth process for a fitness function that, while being local, finds a community partition that covers the whole network, updating the scale parameter dynamically. We test the quality of our results on a set of benchmarks of heterogeneous graphs. We discuss alternative measures for evaluating the community structure and, in light of them, infer possible explanations for the better performance of local methods compared to global ones in these cases.
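
    To make the idea of a local fitness function concrete, here is a minimal sketch of the fitness used in the algorithm by Lancichinetti et al., f = k_in / (k_in + k_out)^alpha, together with a simplified greedy growth from a seed vertex. It is not the growth process proposed in this paper: the names `fitness` and `grow_community`, the adjacency-dict input, the fixed `alpha` (assumed here to play the role of the scale parameter), and the omission of any node-removal step are all assumptions made for the example.

```python
def fitness(adj, community, alpha=1.0):
    """Local fitness k_in / (k_in + k_out) ** alpha of a vertex set.

    k_in counts each internal edge twice (total internal degree);
    k_out counts edges leaving the set.
    """
    k_in = k_out = 0
    for v in community:
        for w in adj[v]:
            if w in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def grow_community(adj, seed, alpha=1.0):
    """Greedily add the neighbour with the largest positive fitness gain.

    Simplified illustration: alpha stays fixed and vertices are never
    removed once added.
    """
    community = {seed}
    while True:
        frontier = {w for v in community for w in adj[v]} - community
        current = fitness(adj, community, alpha)
        best, best_gain = None, 0.0
        for w in sorted(frontier):  # sorted for deterministic tie-breaking
            gain = fitness(adj, community | {w}, alpha) - current
            if gain > best_gain:
                best, best_gain = w, gain
        if best is None:  # no neighbour improves the fitness
            return community
        community.add(best)
```

    On two triangles joined by a single bridge edge, growing from a vertex of one triangle stops at that triangle, because crossing the bridge lowers the fitness.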

    Association of candidate gene polymorphisms with clinical subtypes of preterm birth in a Latin American population

    Background. Preterm birth (PTB) is the leading cause of neonatal mortality and morbidity. PTB is often classified according to clinical presentation: idiopathic (PTB-I), preterm premature rupture of membranes (PTB-PPROM), and medically induced (PTB-M). The aim of this study was to evaluate the associations between specific candidate genes and clinical subtypes of PTB. Methods. 24 SNPs were genotyped in 18 candidate genes in 709 infant triads. Of them, 243 were PTB-I, 256 PTB-PPROM, and 210 PTB-M. These data were analyzed with a Family-Based Association. Results. PTB was nominally associated with rs2272365 in PON1, rs883319 in KCNN3, rs4458044 in CRHR1, and rs610277 in F3. In the analysis by clinical subtype, 3 SNPs were associated with PTB-I (rs2272365 in PON1, rs10178458 in COL4A3, and rs4458044 in CRHR1), rs610277 in F3 was associated with PTB-PPROM, and rs883319 in KCNN3 and rs610277 in F3 were associated with PTB-M. Conclusions. Our study identified polymorphisms potentially associated with specific clinical subtypes of PTB in this Latin American population. These results could suggest a specific role of such genes in the mechanisms involved in each clinical subtype. Further studies are required to confirm our results and to determine the role of these genes in the pathophysiology of clinical subtypes.

    The ocean sampling day consortium

    Ocean Sampling Day was initiated by the EU-funded Micro B3 (Marine Microbial Biodiversity, Bioinformatics, Biotechnology) project to obtain a snapshot of the marine microbial biodiversity and function of the world’s oceans. It is a simultaneous global mega-sequencing campaign aiming to generate the largest standardized microbial data set in a single day. This will be achievable only through the coordinated efforts of an Ocean Sampling Day Consortium, supportive partnerships and networks between sites. This commentary outlines the establishment, function and aims of the Consortium and describes our vision for a sustainable study of marine microbial communities and their embedded functional traits

    A large scale hearing loss screen reveals an extensive unexplored genetic landscape for auditory dysfunction

    The developmental and physiological complexity of the auditory system is likely reflected in the underlying set of genes involved in auditory function. In humans, over 150 non-syndromic loci have been identified, and there are more than 400 human genetic syndromes with a hearing loss component. Over 100 non-syndromic hearing loss genes have been identified in mouse and human, but we remain ignorant of the full extent of the genetic landscape involved in auditory dysfunction. As part of the International Mouse Phenotyping Consortium, we undertook a hearing loss screen in a cohort of 3006 mouse knockout strains. In total, we identify 67 candidate hearing loss genes. We detect known hearing loss genes, but the vast majority, 52, of the candidate genes were novel. Our analysis reveals a large and unexplored genetic landscape involved with auditory function

    Relaxation of Adaptive Evolution during the HIV-1 Infection Owing to Reduction of CD4+ T Cell Counts

    Background: The first stages of HIV-1 infection are essential to establish the diversity of the virus population within the host. It has been suggested that adaptation to host cells and antibody evasion are the leading forces driving HIV evolution at the initial stages of AIDS infection. In order to gain more insight into adaptive HIV-1 evolution, genetic diversity was evaluated over the course of infection in individuals infected from the same viral source in an epidemic cluster. Multiple sequences of the V3 loop region of HIV-1 were serially sampled from four individuals: a single blood donor, two blood recipients, and a fourth individual sexually infected by one of the blood recipients. The diversity of the viral population within each host was analyzed independently at distinct time points during HIV-1 infection. Results: Phylogenetic analysis identified multiple HIV-1 variants transmitted through blood transfusion, but the establishment of new infections was initiated by a limited number of viruses. Positive selection (d(N)/d(S)>1) was detected in the viruses within each host at all time points. In the intra-host viruses of the blood donor and of one blood recipient, X4 variants appeared in 1993 and 1989, respectively. In both patients X4 variants never reached high frequencies during the infection. The recipient in whom X4 variants appeared developed AIDS but maintained a narrow and constant immune response against HIV-1 during the infection. Conclusion: Slowing rates of adaptive evolution and increasing diversity in HIV-1 are consequences of CD4+ T cell depletion. The dynamics of the R5-to-X4 shift are not associated with the initial amplitude of the humoral immune response or the intensity of positive selection.
    Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Fed Univ Para, Inst Biotechnol, BR-66059 Belem, Para, Brazil; Univ São Paulo, Inst Trop Med, São Paulo, SP, Brazil; CDC, Ctr Dis Control & Prevent, Branch Lab, Atlanta, GA 30333 USA; Univ Calif San Francisco, Dept Lab Med, San Francisco, CA 94143 USA; Blood Syst Res Inst, San Francisco, CA USA; Blood Syst Inc, San Francisco, CA USA; Universidade Federal de São Paulo, São Paulo, Brazil; FAPESP: 07/52841-8; Web of Scienc